Density-Based Multiscale Data Condensation
Authors
Abstract
A problem gaining interest in pattern recognition applied to data mining is that of selecting a small representative subset from a very large data set. In this article, a nonparametric data reduction scheme is suggested. It attempts to represent the density underlying the data. The algorithm selects representative points in a multiscale fashion, which distinguishes it from existing density-based approaches. The accuracy of representation by the condensed set is measured in terms of the error in density estimates of the original and reduced sets. Experimental studies on several real-life data sets show that the multiscale approach is superior to several related condensation methods in terms of both condensation ratio and estimation error. The condensed set obtained was also experimentally shown to be effective for some important data mining tasks like classification, clustering, and rule generation on large data sets. Moreover, it is empirically found that the algorithm is efficient in terms of sample complexity.
Index Terms: Data mining, multiscale condensation, scalability, density estimation, convergence in probability, instance learning.
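The abstract does not spell out how representative points are chosen. The following is a minimal sketch of one way density-based multiscale selection can work, assuming a k-nearest-neighbour disc rule: the point with the smallest k-NN radius (the densest neighbourhood) is kept, and every point within twice that radius is discarded, so dense regions are covered by small discs and sparse regions by large ones. The function names `knn_radius` and `condense`, the parameter `k`, and the factor of 2 are assumptions made for illustration and are not taken from the text above.

```python
import math


def knn_radius(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour in points."""
    dists = sorted(
        math.dist(points[i], q) for j, q in enumerate(points) if j != i
    )
    return dists[k - 1]


def condense(points, k):
    """Greedy condensation sketch (illustrative, not the paper's exact code).

    Repeatedly keeps the point with the smallest k-NN radius and removes
    every remaining point within twice that radius of it.
    Returns a list of (point, radius) pairs; the radius records the local
    scale at which the point was selected.
    """
    remaining = list(points)
    selected = []
    # Stop once too few points remain to define a k-th neighbour;
    # in this sketch those leftover points are simply dropped.
    while len(remaining) > k:
        radii = [knn_radius(remaining, i, k) for i in range(len(remaining))]
        best = min(range(len(remaining)), key=radii.__getitem__)
        centre, r = remaining[best], radii[best]
        selected.append((centre, r))
        # The centre itself is at distance 0, so it is removed as well.
        remaining = [q for q in remaining if math.dist(centre, q) > 2 * r]
    return selected
```

In this sketch a larger `k` gives larger discs and therefore a smaller condensed set. The pairwise distance computation is quadratic in the number of points, so it is only suitable for small illustrative inputs.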
Similar resources
Computing Initial points using Density Based Multiscale Data Condensation for Clustering Categorical data
The K-Modes clustering algorithm [1] has shown great promise for clustering large data sets with categorical attributes. The K-Modes clustering algorithm suffers from the drawback of random selection of the initial points (modes) of the clusters. Different initial points lead to different cluster formations. In this paper, the Density-based Multiscale Data Condensation [2] approach with Hamming dist...
Probability Density Estimation from Optimally Condensed Data Samples
The requirement to reduce the computational cost of evaluating a point probability density estimate when employing a Parzen window estimator is a well-known problem. This paper presents the Reduced Set Density Estimator that provides a kernel-based density estimator which employs a small percentage of the available data sample and is optimal in the L2 sense. While only requiring O(N) optimizatio...
Numerical Simulation of a Hybrid Nanocomposite Containing Ca-CO3 and Short Glass Fibers Subjected to Tensile Loading
The tensile properties of multiscale, hybrid, thermoplastic-based nanocomposites reinforced with nano-CaCO3 particles and micro–short glass fibers (SGF) were predicted by a two-step, three-dimensional model using ANSYS finite element (FE) software. Cylindrical and cuboid representative volume elements were generated to obtain the effective behavior of the multiscale hybrid composites. In the fir...
A variational multiscale method based on bubble functions for convection-dominated convection-diffusion equation
This work presents a variational multiscale method based on polynomial bubble functions as subgrid scale and a numerical implementation based on two local Gauss integrations. This method can be implemented easily and efficiently for the convection-dominated problem. Static condensation of the bubbles suggests the stability of the method and we establish its global convergence. Representative nu...
Erratum: Bose-Einstein Condensation beyond Mean Field: Many-Body Bound State of Periodic Microstructure
This is a correction to the author’s article [Multiscale Model. Simul., 10 (2012), pp. 383–417].
Journal title:
- IEEE Trans. Pattern Anal. Mach. Intell.
Volume 24, issue
Pages -
Publication date 2002